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This project explores the use of machine learning techniques to accurately predict travel times 
in city streets and highways using floating car data (location information of user vehicles on a road 
y—i network). The aim of this report is twofold, first we present a general architecture of solving this 

problem, then present and evaluate few techniques on real floating car data gathered over a month 
on a 5 Km highway in New Delhi. 

O 

Q 1 Floating Car Data Based Traffic Estimation 

Data used to estimate the travel times on road networks come in two varieties, one being fixed 
sensors on the side of the road such as magnetometer detectors or highway cameras [5, 6]. The 
second method is fioating car data (FCD). Floating car data are position fixes of vehicles traversing 
city streets throughout the day. The most common type of FCD comes from taxi's or delivery 
^ 1 vehicles which are on main arterial roads and highways throughout most of the day. 

^ This second approach has many positive and negative attributes that must be dealth with to 

^ provide accurate travel time inference. First, FCD is perhaps the most inexpensive data to attain, 

' since many taxi services, and delivery companies automatically gather this data on their vehicles 

^ for logistic purposes. Second, position fixes are generally very accurate, since GPS is used and this 

has high accuracy. There are however many disadvantages as well. First, FCD is usually sampled 
^\ infrequenty, on the order of 2-3 minutes. The reason for this is that taxi or delivery companies do 

not need such fine time granularity of their vehicles position. Therefore quiet a bit of preprocessing 
needs to take place in order to "snap" sets of points onto the proper streets with the possibility 
that multiple paths might have lead to the same pair of traverse points. Another disadvantage of 
this method is that a high density of data is required to get meaningfuU travel time predictions 
for a given road network. Keeping all this in mind, constructing more and more accurate travel 
I time predictions can be a fruitful Algorithms/Machine Learning/Statistical Modelling problem with 

• • various problems to tackle. 

> 

^ 2 Prediction Architecture 

Building a traflic estimation, system using millions of incoming FCD streams is a computational 
and algorithmic challenge. Various subcomponents requiring significant research work need to be 
developed and integrated. Here we give a breif overview of the various processing steps required as 
well as the issues explored in this report. Figure (pi presents a high level view of such a system. 



1 



2 Prediction Architecture 
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Fig. 1: Travel Time Prediction Architecture: For single user, input is sequence of 

{tlXiyi} ... {tNXNUN}- 



3 Preprocessing 
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3 Preprocessing 

3.1 Motion Detection 

Since the incoming streams of traffic data come from taxi cabs, delivery vehicles, or other comercial 
fleet vehicles, there will always be a certain amount of ambiguity between a slowdown in traffic 
and a commercial stop by the vehicle, (i.e. for a taxi customer, or a delivery vehicle dropping off 
packages). Therefore, any further processing must clean out all unwanted stops that are in the GPS 
logs. The most common and inuitive technique, and that which is used in this project is to track 
the the number of consequtive GPS points that are less than a specified distance Dmax from each 
other. 

At each iteration a running tally Nt is kept for the number of GPS points that are within Dmax 
of each other. If Nt is greater than some threshold N^ax then the previous N^ax points are labeled 
as invalid. To compute the threshold parameters (NmaxD), a supervised learning algorithm is can 
technique. In live taxi data, the taxi driver will activate a counter to track the distance crossed 
by the vehicle and the time spent farying the customer accross town, this is used to construct the 
proper labels for motion and stopping of the taxi. If this secondary informaiton is not available, 
another option is to visually inpect a large data set of GPS data, and manually identify the regions 
where a stop has taken place. 



3.2 Map Matching 

Map Matching is a widely studied problem in transportation research is perhaps the most com- 
putaionally difficult and important subcomponent. The input is a time indexed sequence of GPS 
coordinates {tixiyi} ... {tNXNyN}i where ti is a standard unix timestamp data structure, and pairs 
{xiUi) represent latitude and longitude coordinate. Map matching attempts to efficiently and accu- 
rately match consecutive points to sequence of links representing road segments in a stardard city 

map. Standard map matching outputs, \ti^startti,End {ej,}„g£;^| ••■ |ijv,start ijv,B«rf {e,^}^^^^^ |, 

herejejjj^g^^ represents a set road segment links for the i*'' path inthegral. Also, Ei is the set of 



all indices representing such path edges. Figure (4.1 1 shows a simple example of map matching 



Map matching is done in a variety of ways depending on a tradeoff of accuracy and efficiency. 
For the task of realtime high capacity map matching, several techniques involving hueristic graph 
search are shown in [1, 2, 3]. These techniques follow a general strategy of greedily adding road 
segments to a solution set as points are processed, each candidate road segment receives a score by 
a distance function defined on road segments and GPS probes. A standard distance function is is 
given by finding the shortest distance between the GPS point and point on the road segment. 

Given a road segment defined by all convex combinations of two GPS coordinates, A and B. 
GA.B ~ {{x, y) : a E [0, 1] , {x, y) — aA + (1 — a) B} points inside the road segment are defined as, 
ca.b (ck) = aA + (1 — a) B. With the standard distance function used map matching algorithms is. 



d{{xy} , e) = I 



niindGPs{eA,B{a),{xy}) Z{{xy},A)<^ and Z({xy},B}<i 

min{dGPs{A,{xy}),dGPsiB,{xy})} else 



Where dgps (?'iP2)and Z(pi,p2) represent standard geographic distance and angle between 
two GPS coordinates. After the candidate pool grows to a specified size, pruning via a hueristic 
shrinks the candidate pool. 



4 Supervised learning for Traffic Estimation 
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Fig. 2: Example of map matching of set of GPS points. Input is set of points with outputted path 
integrals. 

4 Supervised learning for Traffic Estimation 

This report explores the use of supervised learning for traffic inference on links on a road network. 
Each technique relies on different modelling assumptions and produces different estimation results 
and has particular advantages and disadvantages that dictates use in individual situations. 

The most general form of learning and prediction is that after learning takes place, some model 
parameterized by 9 will be available. Some prediction function, F (^{^n}^^^^ ! will use the 
learned parameters, and output an estimate of the travel times through a new set of links {eJij^^g^ • 

4.1 Regression Technique: 

Regression techniques rely on an additive model for travel times. That is the travel time for a 
commuter traversing a set of links will be the sum of the travel times for the set of links traversed. 
Therefore if the current link travel times are already known, then the estimated travel time for 



some path through the network in figure (4.1 1 will the sum of the link travel times. 

Under a regression model, each ie (cj) = Oi therefore given link travel times, any observed travel 
time will follow 

2/i E„6is, te (e„) - N (E„eis. *e (e„) , a^) 

Y = xe + E 

Throughout this project, we make different assumptions on the observed travel times depending 
on whether long term averages, or short term deviations are being computed. 



4 Supervised learning for Traffic Estimation 
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4.1.1 Long Run Historical Travel Time 

A goal of traffic flow analysis is having long term historic data on flow volume through various 
links. To construct a regression model taking into account spatial variation of historic TT values, 
we can assume that adjacent links are distributed normally: 
i(e„)-i(e„_i)-iV(0, r^) n = {1, . . . , iV - 1} 

This enforces a penalty on sudden changes to the historic means, as well as makes estimation 
of parameters tractable when the maximum likelihood estimator is underdetermined. With this 
assumption we construct the following maximum a posteriori estimator. 

A = argmaxX;^ log P (?/'| E„eis * + Em log P [t (e„) - t (e,„_i)) 

t(e) 

= argminE^ ||y' - * (^n) + ^ IIEm * (^m) - ^ (cm-OHa 

t(e) 

= argmin||y-Xe|| + Ai \\D(d\\^ 


Which is simply a Ridge Regression or L2 Regression with smoothness penalty on the variation 
of historic mean along different links [6] . 



4.1.2 Short Term Incidence Detection on Live Data 

Given a short time window (on order of 1 hour) we assume the distribution on any given day for link 

t (ci) — 6i + A^where Oils a historical average and A^is random variable representing the deviation 

that day from the historical average. The important assumption here is that this deviation is 

distributed with a heavy tail. (i.e. no deviation with high probability and large deviation with low 

probability). To model this, we use a laplacian prior distrbution: p{Ai;a) — e'~l. 

Given a set of historic data, and a small number of measured daily measured paths, we can use 

this model to computer short term deviations from the historic values. 

A = argmaxE, log P {y'\ EneB *e (e„)) + Em ^o.9 ^ (^e (e™)) 
te(e) 

= argminEj ^ \\y' - E„eE, *e ien)\\^ + ^ J2rn ll^e iem)\\l 
te(e) 

= argmin \\Y - X (8 + A)||^ + A2 ||e + A||^ 

A 

Therefore in every small time window that live probe data arrives, an LI regression is performed 
estimate deviations from the historic travel times. Since probe volume is small compared to the 
number of links, this technique makes intuitive sense. When performing live prediction we assume 
that the parameters {Q, Ai, A2} are already computed. Therefore in each time window used for 
regression, the parameters {A}^ are computed and predictions are given by Ens-E ~^ ^n)- 



4.1.3 Computing Model Parameters 

In modelling travel time distributions, we explicitley separate long term and short term behaviour to 
easily estimate the model parameters. With this, parameter estimation is performed in two stages. 
First, a subset of the training data is aggregated and crossvalidation is performed to determine Q 
and appropriate Ai. In the second stage, the short term model parameter A2 is computed with 
similar cross validation with training data (See note (U])). This data processing model is shown in 



figure (4.1.3). This is a visual interpretation of how data processing takes place. Each block is a 
tuple of paths and travel times (i.e. X, and Y from figure (|4.1|). 



5 Experimental Results 
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Fig. 3: Data Processing model for estimating historic means for link travel times, as well as LI 
regression parameters. Test data used daily to compute short term deviations from historic 
average. 



Tab. 1: Input Data Statistics: 4/5/2008 - 4/26/2008 (22 days, 8-9 AM) 



min 


max 


mean 


std 




34 


110 


66.3 


27.5 


Raw Data in Sector 


11 


65 


26.8 


13.6 


Processed Path Integrals 



El 

4.2 Travel Time Median Backprojection: 

The second general technique used in travel time estimation is to merely backproject the travel 
time distances across each traversed link weighted proportionally to the lenght of the links. After 
many paths are processed, each link will map to a list of backprojected travel time values. The 
travel time estimate of the link is simply the median over the distribution. We use this algorithm 
as a naive baseline to compare the more sophisticated model. 



5 Experimental Results 
5.1 Preprocessing Results 

Components of the system architecture were implemented according to figure ^ to process fioating 
car data from approximately 40 Taxi's over the course of about one month in urban New Delhi. 
The total dataset of GPS logs span 22 days: dates 4/5/2008 to 4/26/2008. The data was sampled 
infrequently at about 2-3 minutes. The reason for this is that some GPS devices sampled every 2 
minutes while some sampled every 3 minutes. Since the focus of this report is the core inference 



techniques, a single 5 Km highway (figure (5.1 1) was chosen for analysis instead of a region of the 
city, or the entire city, instead of an entire city. Another reason why a long highway was chosen 
is that the fundemental assumption that link travel times can be added to estimate the combined 
travel time is a more accurate model for highways than for city arterial roads [4] . 



^ Recall that this separation between long term historic data and short term fluctuations is an assumption we 
explicitly make. A more thorough model would merely include daily laplacian deviations into the first model. 
However comstructing an estimator for this latent variable model, requires an EM algorithm. This is a definately a 
future extension of our work. 



5 Experimental Results 
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Fig. 4: Single road learning experiment. 




Fig. 5: Track Creation process - Primary reprocessing step. 



The data included all FCD logs that were contained in the geospatial coordinates shown in figure 



(5.1 1. Therefore much of the data belonged to taxi's that might have traversed nearby streets. To 
isolate the data required for inference of the highlighted street, full map matching on the data was 
implement and used to seperate FCD data traversing nearby streets. Additionally, stop detection 
as described earlier was used to further remove artifacts that would cause false positives in incident 
detection. Statistics on the raw input data and final post processed data are given in table (5.11. 
For this project, additional processing was done to remove any artifacts caused by missed detections 
of taxi stops. From table (5.11 it is apparent that a significant proportion of the data contained 
stops or belonged to paths on nearby streets. 

Figure (5.11 shows a sample of the processing done. The each green point was determined an 
acceptable path integral end point, while red points were rejected. The dashed line indicates the 
start and end of each path integral. All logs for this example were sampled uniformly. By visual 
inspection, one can infere that the upper left quadrant will have much lower travel times, than the 
lower right portion of the street. 
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Historic Link travel times 
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Tab. 2: Main Results comparing 3 algorithms. 



Technique 


Error Rate (% of True) 


Std Dev. 


N 


95 % Confidence Interval 


Historical Average 


26.82 % 


.10 


234 


[25.53, 28.1] 


Historical Average And Incidence Detection 


22.1 % 


.12 


234 


[20.56, 23.6] 


Median Backproject 


32.4 % 


.14 


234 


[30.60, 34.19] 



5.2 Inference Results 



Supervised learning of post processed data follows the model presented in figure (4.1.3). Here, 
^train blocks (cach representing one day's paths) were chosen to learn the historical travel time 
model. From the entire A^jrain ^ fold cross validation was used with varying model parameter Ai 
to estimate the optimal historical travel time 8. That is the Ai yielding the lowest cross validation 



error for the data was chosen as the optimal parameter. Figure (5.2) shows the results from cross 
validation; the historical travel times, optimal Aiand minimum training error. Cross validation 
error for training historic travel times yielded an optimal Ai = 24.45 and MSEmin = 22%. 

Fitting the parameters for the daily incidence model follows an identical methodology as for the 
long term historic mean estimation. Nf^^^^ days of data are used, wher for each day, leave one out 
cross validation (since the number of path integrals is on the order of 10 - 15) is performed where 
parameter A is estimated with the training data and used to predict the travel time of the one out 
path integral/TT. As before, A2 is varied and an optimal value can be determined by looking at 
the lowest test error. 



5.3 Test Prediction 

Finally with the model parameters computed, the incident detection technique is tested as follows: 
All data not used in the first two phases {Ntest) is tested for each individual day, whereby all but 
one of the path integrals is assumed to be the day's probe data (i.e. leave one out model validation) 
. This data is used to to find the sparse deviations from the historic mean {A} by solving the 
LASSO problem presented earlier. The deviations are then incorporated with the historic mean to 
provide the predicted travel time. 



6 Conclusions and Future Directions 
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In this report we tested 3 separate algorithms to compare their results. Multiple random assign- 
ments were made between the 22 files and the subsets used for training and testing (-/V^^rami ^traim ^test)- 
First we merely use the historic means as a default estimate, without updating the link travel times 
by any live data. Second we apply the incidence detection technique introduced in the paper. Fi- 
nally we apply the naive median backproject technique. Table (5.3) summarizes the final computed 
test errors for the different techniques discussed. The results show an improvement of incidence 
detection over merely using the historical average via regularized least square. The worst tech- 
nique was the median backproject. The error values are in percentage of true link travel time. A 
simple statistical analysis of the incidence detection results shows that the improvement is statisti- 
cally significant. Under 95% confidence interval there is a statistically significant difference in each 
algorithms results. 



6 Conclusions and Future Directions 



This report presents a unified architecture for Floating Car Data based travel time inference as well 
as proposes an incidence based inference technique that is evaluated and shown to be quiet effective 
at estimating travel times on various links in transportation networks. The largest impediment in 
achieving more accuracy appears to be the amount of data required. The road network was chosen 
since from the data available, it had the highest density of FCD. Regardless, at the rush hour time 
of 8-9 AM an average of 26 or so path integrals were constructed. Therefore future tests require a 
higher density of FCD. Also, a unified EM algorithm (mentioned in the footnotes) for training the 
historical data, while incorporating the sparse deviations model should be investigated. 
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